high-dimensional graphical model selection
High-Dimensional Graphical Model Selection Using $\ell_1$-Regularized Logistic Regression
We focus on the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an $\ell_1$-constraint. Our framework applies to the high-dimensional setting, in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main result is to establish sufficient conditions on the triple (n, p, d) for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under certain mutual incoherence conditions analogous to those imposed in previous work on linear regression, we prove that consistent neighborhood selection can be obtained as long as the number of observations n grows more quickly than $6d^6 \log d + 2d^5 \log p$, thereby establishing that logarithmic growth in the number of samples n relative to graph size p is sufficient to achieve neighborhood consistency.
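The following is a minimal pure-Python sketch of the neighborhood-selection idea. It assumes zero-field Ising data coded as ±1 and a model with no intercept. For each node, it minimizes the empirical logistic loss plus λ‖w‖₁ by proximal gradient descent (ISTA), then reads the neighborhood off the nonzero weights. Several choices here are illustrative and are not the authors' code: the ISTA solver, the helper names (`estimate_neighborhood`, `estimate_graph`, `sample_toy_mrf`), the step size, and the "AND" rule used to merge per-node neighborhoods into a graph. The toy model has a single edge, between nodes 0 and 1.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def estimate_neighborhood(samples, node, lam, lr=1.0, iters=200, tol=1e-6):
    """Hypothetical helper: estimate the neighbors of `node` from +/-1 samples.

    Regresses y = x_node on z = (all other variables) by minimising
        (1/n) * sum_i log(1 + exp(-y_i * w . z_i)) + lam * ||w||_1
    with proximal gradient descent (ISTA); returns the support of w.
    """
    n = len(samples)
    others = [t for t in range(len(samples[0])) if t != node]
    ys = [s[node] for s in samples]
    zs = [[s[t] for t in others] for s in samples]
    w = [0.0] * len(others)
    for _ in range(iters):
        grad = [0.0] * len(others)
        for y, z in zip(ys, zs):
            margin = y * sum(wj * zj for wj, zj in zip(w, z))
            # d/dw_j of log(1 + exp(-margin)) is -y * z_j / (1 + exp(margin))
            coef = -y / (1.0 + math.exp(margin))
            for j, zj in enumerate(z):
                grad[j] += coef * zj / n
        # gradient step on the smooth loss, then soft-threshold for the l1 term
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return {t for t, wj in zip(others, w) if abs(wj) > tol}


def estimate_graph(samples, lam):
    """Run neighborhood selection at every node; keep edge (s, t) only if
    each endpoint selects the other (the conservative "AND" rule)."""
    p = len(samples[0])
    nbrs = [estimate_neighborhood(samples, s, lam) for s in range(p)]
    return {(s, t) for s in range(p) for t in nbrs[s] if s < t and s in nbrs[t]}


def sample_toy_mrf(n, seed=0):
    """Hypothetical 4-node toy model with one edge 0-1: x1, x2, x3 are
    independent uniform +/-1 spins and x0 copies x1 with probability 0.9."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x1, x2, x3 = (rng.choice((-1, 1)) for _ in range(3))
        x0 = x1 if rng.random() < 0.9 else -x1
        out.append((x0, x1, x2, x3))
    return out
```

For example, `estimate_graph(sample_toy_mrf(1000), lam=0.1)` should keep only the edge `(0, 1)`. The AND rule is the more conservative of the two common ways to reconcile per-node estimates. The alternative "OR" rule keeps an edge if either endpoint selects it.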
High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation known as the conditional mutual information test. This simple local algorithm requires only low-order statistics of the data and decides whether two nodes are neighbors in the unknown graph. Under some transparent assumptions, we establish that the proposed algorithm is structurally consistent (or sparsistent) when the number of samples scales as $n = \Omega(J_{\min}^{-4} \log p)$, where p is the number of nodes and $J_{\min}$ is the minimum edge potential. We also prove novel non-asymptotic necessary conditions for graphical model selection.
Authors: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
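Below is a rough pure-Python sketch of a conditional-mutual-information threshold test for discrete data. It assumes the test declares an edge (i, j) when the smallest plug-in estimate of I(X_i; X_j | X_S) exceeds a threshold `xi`, where the minimum is taken over conditioning sets S of size at most `eta`. The names `cond_mutual_info`, `cmit_graph`, and `sample_chain`, and the particular values of `eta` and `xi`, are hypothetical; this is not the authors' implementation. Each score needs counts over at most eta + 2 variables, which is what keeps the test local and limited to low-order statistics.

```python
import itertools
import math
import random
from collections import Counter


def cond_mutual_info(samples, i, j, S):
    """Plug-in estimate (in nats) of I(X_i; X_j | X_S) from discrete samples."""
    n = len(samples)
    c_ijs = Counter((s[i], s[j], tuple(s[k] for k in S)) for s in samples)
    c_is = Counter((s[i], tuple(s[k] for k in S)) for s in samples)
    c_js = Counter((s[j], tuple(s[k] for k in S)) for s in samples)
    c_s = Counter(tuple(s[k] for k in S) for s in samples)
    total = 0.0
    for (a, b, z), c in c_ijs.items():
        # p(a,b,z) * log[ p(a,b,z) p(z) / (p(a,z) p(b,z)) ], with counts in place of n*p
        total += (c / n) * math.log(c * c_s[z] / (c_is[(a, z)] * c_js[(b, z)]))
    return total


def cmit_graph(samples, eta, xi):
    """Declare edge (i, j) when min over |S| <= eta of the empirical
    I(X_i; X_j | X_S) exceeds the threshold xi."""
    p = len(samples[0])
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        rest = [k for k in range(p) if k not in (i, j)]
        score = min(
            cond_mutual_info(samples, i, j, S)
            for size in range(eta + 1)
            for S in itertools.combinations(rest, size)
        )
        if score > xi:
            edges.add((i, j))
    return edges


def sample_chain(n, flip=0.1, seed=1):
    """Hypothetical Ising chain 0 - 1 - 2: x1 is a uniform +/-1 spin and
    x0, x2 each copy x1 independently, flipped with probability `flip`."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x1 = rng.choice((-1, 1))
        x0 = x1 if rng.random() >= flip else -x1
        x2 = x1 if rng.random() >= flip else -x1
        out.append((x0, x1, x2))
    return out
```

On the chain 0 - 1 - 2, nodes 0 and 2 are marginally dependent, so an unconditional score alone would add a spurious edge between them. Conditioning on node 1 drives their score to nearly zero, so with `eta=1` and a small threshold such as `xi=0.05` the test should keep only `(0, 1)` and `(1, 2)`.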
100 top data science presentations
We've already published the top big data presentations on SlideShare, as well as a great GitHub list of public data sets, top machine learning projects, and top R packages. We've also asked our readers to share a list of top data science videos on YouTube. Here, we share a list of the top data science presentations from VideoLectures.net. These presentations received 5 to 20 times fewer page views than those on SlideShare because they are far more technical and attract a different, truly technical audience. You can check the entire list here.